ABSTRACT

This statement defines how NetWare features such as HotFix and read-after-write verification behave in general, the extent of a VADD's role in these features, and how the LANStor ESDI and SCSI VADDs, in particular, implement them. Also included are some comments on DGroup memory-space usage and the Adaptec 2322b ESDI controller. The word "verification" is used throughout in place of the longer phrase "read-after-write verification".

HOTFIX

If a VADD runs into a problem, it notifies NetWare and describes the problem as either a controller error or a media error. For a controller error, NetWare will retry the operation several times before shutting down the drive and any other drives on the same channel. HotFix is not invoked because NetWare assumes that if the controller is failing, reads and writes performed by HotFix would fail as well.

The HotFix feature is invoked when a VADD reports a media error: the controller is working, but for some reason the data could not be stored or retrieved. Write operations are handled simply: a new block (eight 512-byte sectors) from the redirection pool replaces the bad block. To HotFix a failed read operation, NetWare recovers as much of the 8-sector block as possible by doing single-sector reads. Any missing data is taken from the mirror drive, if one is available.

Usually media errors are detected by the drive's controller. The VADD consults the controller and, in turn, notifies NetWare.

READ-AFTER-WRITE VERIFICATION

Theory...

The verification feature supplements the fault-tolerant nature of NetWare. Each time data is written, it is read back and verified for accuracy. If the verification fails, NetWare is notified and the HotFix feature is invoked (because the VADD reports the failure as a media error). The intent of this feature is to guarantee that data can be moved reliably from server memory to the disk-system and recorded properly.
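
The interplay between verification and HotFix described above can be pictured with a small toy model. This is only an illustrative sketch, not NetWare or LANStor code: every name in it (ToyDisk, MediaError, verified_write, ToyHotFix) is hypothetical, the "disk" is a dictionary in memory, and a bad block is simulated by corrupting the first byte of whatever is written to it. The driver-level step writes a block, reads it back into memory and compares; a mismatch is reported as a media error, and the O/S-level step responds by redirecting the block to a spare from a redirection pool.

```python
SECTOR_SIZE = 512
SECTORS_PER_BLOCK = 8
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK  # 4096 bytes per block


class MediaError(Exception):
    """Hypothetical signal: the controller works, but the data did not stick."""


class ToyDisk:
    """In-memory stand-in for a drive (an assumption for illustration only)."""

    def __init__(self, bad=()):
        self.blocks = {}
        self.bad = set(bad)  # block numbers whose writes are silently corrupted

    def write(self, block_no, data):
        if block_no in self.bad:
            # Simulate a write that "succeeds" but records the wrong data.
            data = bytes([data[0] ^ 0xFF]) + data[1:]
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks[block_no]


def verified_write(disk, block_no, data):
    """Driver-level step: write, read back into memory, compare."""
    disk.write(block_no, data)
    if disk.read(block_no) != data:
        raise MediaError(block_no)


class ToyHotFix:
    """O/S-level step: on a media error, redirect the block to a spare."""

    def __init__(self, disk, spare_blocks):
        self.disk = disk
        self.spares = list(spare_blocks)  # toy redirection pool
        self.redirect = {}                # original block -> spare block

    def write(self, block_no, data):
        target = self.redirect.get(block_no, block_no)
        try:
            verified_write(self.disk, target, data)
        except MediaError:
            spare = self.spares.pop(0)    # toy: assumes the pool is not empty
            self.redirect[block_no] = spare
            verified_write(self.disk, spare, data)

    def read(self, block_no):
        return self.disk.read(self.redirect.get(block_no, block_no))


# Usage: block 7 is bad, so the write is redirected to the first spare.
disk = ToyDisk(bad={7})
hotfix = ToyHotFix(disk, spare_blocks=[1000, 1001])
payload = bytes(range(256)) * 16          # exactly one 4096-byte block
hotfix.write(7, payload)
```

Note that the comparison happens against the copy still held in memory, which is the point of the feature: the data makes the full round trip before it is trusted.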

Two aspects of the disk-system are checked by this feature: media reliability and communications (cable) integrity.

Sometimes an area of a disk's medium can be written successfully, yet fail when read at a later time. By reading and comparing the data after writing, "soft" errors of this type can be reduced. Some disk-system hardware can perform this function by itself, independent of a VADD, the O/S or anything else.

The cables used in a disk-system are no less important than the disk drives themselves. To check their reliability, data must make a complete circuit from server memory to the disk-system and back again. It is crucial that data be brought back into server memory so it can be compared to the original. For this reason, disk-system hardware that performs verification by itself does only half the job unless the data is somehow brought back into server memory.

Reality...

The higher-quality disk-systems can be counted upon either to record data properly or to provide notification if they cannot. Any "soft" errors are usually defeated with an arsenal of ECC recovery, retries with skew adjustments and relocation of bad sectors, all handled automatically by the disk-system hardware.

Communications problems with the cables are usually dealt with when a new server is installed. It either works or it doesn't. Maybe the cable and controller are too close to some noisy circuit board, or there is a short in the cable, or something like that. But for the most part, once good communication is established with the disk-system, it will remain that way unless the server is moved or outfitted with different boards.

So, depending on your comfort level, the verification feature is either burdensome overhead or a degree of extra insurance in case of intermittent noise problems.

LANStor READ-AFTER-WRITE VERIFICATION

With the introduction of NetWare 2.1x, the O/S no longer performed the verification automatically.
It became the responsibility of the VADD to do the verification. Novell directed all VADD implementors to include this feature, and no exceptions were permitted. The Novell literature does state that the O/S performs the verification. Such a statement is not inaccurate if the writer viewed VADDs and LAN drivers as part of the O/S.

The first LANStor VADDs always did verification, which involves an extra read following a write. We later made it an option because other vendors had excluded this feature from the very start and we simply didn't perform as well when compared to them in benchmarks.

If you choose to use the verification feature, LANStor will, during its initialization, reserve one 4096-byte buffer for each NetWare channel upon which the VADD is loaded.

LANStor AND DGROUP USAGE

The DGroup data area in NetWare is a 64k chunk of memory somewhere in the file server. The O/S, LAN drivers, VADDs and VAPs and the machine stack all share this region. All third-party implementors were advised by Novell to curtail their use of the DGroup area, for DGroup is practically all used up even before one starts linking in third-party software. It is for this reason that all LANStor VADDs use only about 10 bytes of the DGroup area.

LANStor does require memory, but it doesn't take it from DGroup. Instead, LANStor dynamically asks NetWare for the memory that it needs. The memory is in the form of segments that are privately owned by LANStor; there are no conflicts with any other process.

A concern that we had early on was the machine stack. It is located in DGroup and we were told it is pretty small to begin with. It gets reduced in size as third-party software is linked with NetWare. All VADDs that we have seen run with interrupts disabled. These VADDs can easily control their stack usage, reducing the chance for a stack overflow. LANStor, on the other hand, runs with interrupts enabled.
We do this so that LAN boards, printers, and other VADDs can do their thing while we do ours. Overall system throughput is increased, along with the risk of stack overflows, since nested interrupts can occur. LANStor avoids this by switching to its own local (and larger) stack whenever it gets control.

ADAPTEC 2322B ESDI CONTROLLER

This controller lacks the ability to perform the read-after-write verification function. It does have a read-ahead feature, which is a different thing altogether. The read-ahead function is simply used to optimize potential sequential reads. There appears to be a problem with the read-ahead feature, though this is not verified. This is why Storage Dimensions suggests that read-ahead be disabled when using this controller.